In this document, we’ll go through the many different steps I’ve taken in an attempt to analyze several years of waterbird surveys at Estero La Cruz in Bahía de Kino, Sonora. Rather than only include the final model and results, I started this document when I began working with the data, so it has evolved and grown alongside me as I worked through the various problems, errors, and roadblocks I encountered. While initially working with the data, I tried to use multiple variables to take advantage of the various data we collect at each survey. However, I’ve learned a lot about modeling and ultimately found myself following the advice to keep it all as simple as possible. By the end of this document, you’ll see that a lot of time and effort has gone into addressing a simple question: has the number of shorebirds at Estero La Cruz changed over time?
The first steps are loading all the required packages and, of course, the data. This may seem like a long list of packages, but I’ve only included the important ones that I know are being used. Any additional packages that these depend on will be loaded automatically.
library(dplyr)
library(ggplot2); theme_set(theme_linedraw())
library(ggpubr)
library(ggeffects)
library(gridExtra)
library(stringr)
library(lme4)
library(MASS)
library(lubridate)
library(tidyr)
library(readr)
library(RColorBrewer)
library(MuMIn)
library(patchwork)
library(pscl)
library(StatisticalModels)
library(numDeriv)
library(multcomp)
library(AICcmodavg)
library(sjPlot)

All the necessary files should be saved in the working directory. The files we’ll load are the counts of birds in each guild for each survey at Estero La Cruz from the 2013/14 cycle to present, and the “environment variables” taken at each survey.
guild_totals <- read_csv("guild.totals.csv",
col_types = cols(Date = col_date(format = "%m/%d/%y")))
enviro <- read_csv("current_enviro.csv",
col_types = cols(Date = col_date(format = "%m/%d/%y")),
           na = "0")

Here’s a sneak peek of the data; as you can see, it’s set up with each survey effort as a row and each guild as a column.
| Date | shore | gulls_terns | pel_corm | land |
|---|---|---|---|---|
| 2013-09-12 | 867 | 315 | 286 | 12 |
| 2013-09-27 | 2129 | 1996 | 363 | 4 |
| 2013-10-11 | 793 | 587 | 506 | 6 |
| 2013-10-25 | 1243 | 1508 | 459 | 1 |
| 2013-11-07 | 2704 | 1473 | 4527 | 6 |
Next, we’ll do a bit of cleaning of the data. To tidy up the environmental variables, we’ll parse the “Tide” variable into “tide_height” and “tide_dir”. This pipe comes from the code Abram sent me.
enviro_clean <- enviro %>%
mutate(tide_height=str_extract(Tide,"mid|low|high|Mid|Low|High"),
tide_height=ifelse(is.na(tide_height),"unk",tide_height),
tide_height=tolower(tide_height),
tide_dir=str_extract(Tide,"slack|rising|falling"),
tide_dir=ifelse(is.na(tide_dir),"slack",tide_dir))
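As a quick aside, here is how those str_extract() patterns behave on a few made-up Tide strings (the example values below are hypothetical, not from the data):

```r
## str_extract() returns the first match of the pattern, or NA if none;
## the pipe above then maps an NA height to "unk" and an NA direction to "slack".
library(stringr)
tide_examples <- c("Low falling", "mid rising", "High", "whatever")
str_extract(tide_examples, "mid|low|high|Mid|Low|High")  # "Low" "mid" "High" NA
str_extract(tide_examples, "slack|rising|falling")       # "falling" "rising" NA NA
```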
##pull out the environment factors/variables
a <- enviro_clean[,3:11]
##pull out date/time to potentially use
date.1 <- guild_totals$Date
mes <- format.Date(date.1, "%m")
mes <- as.double(mes)
mes.yr <- format.Date(date.1,"%m/%y")
##bring it all together into the same data frame
guild_totals <- cbind(guild_totals, a)
guild_totals <- cbind(guild_totals, mes)
guild_totals <- cbind(guild_totals, mes.yr)
##now remove "summer" data because sampling isn't consistent
guild_totals <- filter(guild_totals, szn != "4")
##later there are some problems because it's an integer
##so I made it a double instead of a factor
guild_totals$Temp <- as.double(guild_totals$Temp)
guild_totals$mes <- as.double(guild_totals$mes)

To start, I made some boxplots to look at each of the guilds, grouped by field season (temporada) and season (estación):
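The boxplots themselves aren’t echoed in this document, but a minimal sketch of one of them (shorebird counts by field season) might look something like this, assuming the guild_totals data frame built above:

```r
## One of the exploratory boxplots described above: shorebird counts
## grouped by field season (temporada).
library(ggplot2)
ggplot(guild_totals, aes(x = factor(temporada), y = shore)) +
  geom_boxplot() +
  labs(x = "Temporada (field season)", y = "Shorebird count")
```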
Also, some background- I have divided the seasons as 1-Fall, 2-Winter, 3-Spring. These splits were chosen because of migration. Someone had suggested grouping fall and spring together (again, because of migration), and while working through, testing, and exploring models it became apparent that spring and fall were pretty consistently similar, so later on I did end up grouping them.
Note that in the data the variable “temporada” is actually the field season that runs fall-summer. So “20” is the fall 2019-present, “19” is fall 2018-spring 2019, etc.
Here is some of my initial messy exploration/attempts at modeling to look for trends/changes in shorebird abundance over time. Because we’re working with count data, I started with a Poisson distribution. From what I’ve read, I found that overdispersion is a common problem when modeling count data so I kept that in mind.
shore_pois_mod <- glm(shore~temporada+szn+mes+Temp+tide_height+tide_dir,
data=guild_totals, family = poisson)
##one way to check for overdispersion is to run a chi-squared
pchisq(summary(shore_pois_mod)$deviance, summary(shore_pois_mod)$df.residual)

## [1] 1
##or you can just divide the deviance of the residuals by df residuals
(shore_pois_mod$deviance)/((shore_pois_mod)$df.residual)

## [1] 1569.193
Ideally, the dispersion ratio should be close to 1. Ours is about 1569… An output that large means there is a lot of overdispersion in the model, or in other words, a lot of variation unaccounted for by the model. This matters because in a Poisson distribution the mean and the variance are supposed to be equal; overdispersion means that the variance is much greater than the mean.
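For what it’s worth, these two checks can be wrapped into a small reusable helper. This is just a sketch (the function name check_disp is mine, not from any package), and note it reports the upper-tail chi-squared probability, so a small p-value flags overdispersion:

```r
## Hypothetical helper bundling both overdispersion checks for a fitted glm.
check_disp <- function(mod) {
  ratio <- mod$deviance / mod$df.residual  # should be near 1 for a good Poisson fit
  p <- pchisq(mod$deviance, mod$df.residual, lower.tail = FALSE)
  c(dispersion_ratio = ratio, chisq_p = p)
}

## Sanity check on simulated (truly Poisson) counts: the ratio should sit near 1.
set.seed(1)
toy <- data.frame(y = rpois(200, lambda = 5), x = rnorm(200))
check_disp(glm(y ~ x, data = toy, family = poisson))
```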
So next I tried to simplify the model and look just at seasonality…
shore_pois_mod1 <- glm(shore~szn, data=guild_totals, family = poisson(link="log"))
summary(shore_pois_mod1)

##
## Call:
## glm(formula = shore ~ szn, family = poisson(link = "log"), data = guild_totals)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -74.587 -43.110 -23.716 8.658 171.421
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.156518 0.006282 1298.36 <2e-16 ***
## szn -0.212637 0.002826 -75.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 281954 on 112 degrees of freedom
## Residual deviance: 276346 on 111 degrees of freedom
## AIC: 277346
##
## Number of Fisher Scoring iterations: 5
## [1] 1
Clearly, I’m still dealing with some overdispersion problems, so next I tried running a glm with a quasipoisson distribution.
shore_qpois_mod <- glm(shore~szn+Temp+mes, data=guild_totals, family = quasipoisson)
summary(shore_qpois_mod)

##
## Call:
## glm(formula = shore ~ szn + Temp + mes, family = quasipoisson,
## data = guild_totals)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -67.26 -39.96 -21.33 10.09 146.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.31364 0.82511 12.500 < 2e-16 ***
## szn -0.44928 0.18406 -2.441 0.01629 *
## Temp -0.01674 0.01156 -1.448 0.15054
## mes -0.08600 0.03088 -2.785 0.00633 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for quasipoisson family taken to be 2844.066)
##
## Null deviance: 281311 on 110 degrees of freedom
## Residual deviance: 238372 on 107 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: NA
##
## Number of Fisher Scoring iterations: 5
## [1] 2227.779
Once again, I’m still having problems with the model being quite overdispersed. So next I tried a negative binomial model.
shore_nb_mod <- glm.nb(shore~szn+Temp+mes+tide_height+tide_dir, data=guild_totals, link = "log")
summary(shore_nb_mod)

##
## Call:
## glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir,
## data = guild_totals, link = "log", init.theta = 1.037219293)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3761 -1.0425 -0.4420 0.2677 1.9843
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 11.51790 0.80757 14.262 < 2e-16 ***
## szn -0.45625 0.16444 -2.775 0.00553 **
## Temp -0.03536 0.01066 -3.317 0.00091 ***
## mes -0.06486 0.02797 -2.319 0.02040 *
## tide_heightlow -1.05641 0.32682 -3.232 0.00123 **
## tide_heightmid -0.30532 0.32105 -0.951 0.34161
## tide_heightunk -0.88247 0.75515 -1.169 0.24257
## tide_dirrising 0.18163 0.28887 0.629 0.52952
## tide_dirslack 0.61548 0.28815 2.136 0.03268 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(1.0372) family taken to be 1)
##
## Null deviance: 168.11 on 110 degrees of freedom
## Residual deviance: 127.54 on 102 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 1912.7
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 1.037
## Std. Err.: 0.123
##
## 2 x log-likelihood: -1892.659
## [1] 1.250395
This is looking a lot better, and actually there is no longer the very high degree of overdispersion, which is great. So let’s check the residuals real quick:
Not too terrible… From these plots of the residuals, there is a bit of a pattern to them, as well as a few data points that jump out (30 and 47). Though it does raise some flags, it looks decent. But rather than being satisfied with this, I kept exploring.
A zero-inflated model is seemingly another route to go, but that wouldn’t work or make any sense at all given that I don’t have any zeros in this data. Though there are varying opinions for and against it, I thought maybe I should try transforming the data instead. This seems like a valid thing to do given the lack of zeros, since zeros seemed to be one big argument against transforming. I checked it out, knowing this might only work with the shorebird guild. While some of the surveys of other guilds do have zeros, from my understanding there are not enough to call for a zero-inflated or truncated Poisson model. But that might be a later conversation with a statistician.
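The no-zeros claim is easy to verify directly; a quick hedged check (column names taken from the data preview above):

```r
## Count surveys that recorded zero shorebirds; per the text, this should be 0.
sum(guild_totals$shore == 0)

## The same check across all four guild columns at once:
colSums(guild_totals[, c("shore", "gulls_terns", "pel_corm", "land")] == 0)
```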
Given the data looks a lot closer to a normal distribution, I decided to see what would happen if I tried running the models with transformed data. So I tried just a classic gaussian distribution glm.
logshore_mod <- glm(log(shore)~temporada+szn+mes+Temp+tide_height+tide_dir, data=guild_totals)
summary(logshore_mod)

##
## Call:
## glm(formula = log(shore) ~ temporada + szn + mes + Temp + tide_height +
## tide_dir, data = guild_totals)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.0272 -0.6739 0.0511 0.7138 2.1781
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.67259 2.01238 10.273 < 2e-16 ***
## temporada -0.39973 0.07894 -5.064 1.85e-06 ***
## szn -0.51013 0.18946 -2.693 0.0083 **
## mes -0.03108 0.03283 -0.947 0.3461
## Temp -0.08332 0.01392 -5.986 3.30e-08 ***
## tide_heightlow -0.15059 0.38954 -0.387 0.6999
## tide_heightmid 0.29917 0.37186 0.805 0.4230
## tide_heightunk 0.58505 0.87764 0.667 0.5065
## tide_dirrising 0.10734 0.33373 0.322 0.7484
## tide_dirslack -0.15400 0.35732 -0.431 0.6674
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.273648)
##
## Null deviance: 211.48 on 110 degrees of freedom
## Residual deviance: 128.64 on 101 degrees of freedom
## (2 observations deleted due to missingness)
## AIC: 353.37
##
## Number of Fisher Scoring iterations: 2
## [1] 0.9668983
This is also looking pretty decent and like a potential route to follow. It might even be better because I’ll get lower AICc values, but I’m not sure if I’ll be able to replicate this with the other guilds. And now let’s check out the residuals:
Again, two data points are jumping out of the data (37 and 57), so I’m gonna take a look at those. Let’s also see if the data is actually distributed normally by running a Shapiro-Wilk normality test.
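The chunk that produced the output below isn’t echoed here, but judging from the data: line it was presumably just:

```r
## Shapiro-Wilk test on the raw shorebird counts; a small p-value
## means we reject the hypothesis that the data are normal.
shapiro.test(guild_totals$shore)
```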
##
## Shapiro-Wilk normality test
##
## data: guild_totals$shore
## W = 0.72117, p-value = 2.363e-13
YIKES. Now lets check the log-transformed data.
##
## Shapiro-Wilk normality test
##
## data: log(guild_totals$shore)
## W = 0.96505, p-value = 0.004724
Ok, so that’s also not distributed normally. So, with some fresh thoughts, I moved on to trying monthly means.
I calculated the mean number of shorebirds per month for each year. This might work because there are typically two surveys per month, but one reason it might not is that every mean will end in .5 or be an integer.
##calc mean # of shorebirds per month for each year
##recall that mes.yr was date formated as "%m/%y"
##so I used that to calculate the means
shore.mean <- aggregate(shore ~ mes.yr, guild_totals, FUN=mean)
##wrote the output into a csv because it seemed easier to combine
write_csv(shore.mean, "shoremeans.csv")
write_csv(enviro_clean, "cleanedenviro.csv")
##then I did the same thing with the "avg temp" per month for each year
##NOTE: still waiting on the CICESE data...
options(na.action = "na.omit")
temp.mn <- aggregate(Temp ~ mes.yr, guild_totals, FUN=mean)
write_csv(temp.mn, "tempmean.csv")
##combined the data outside in excel--oooops--and then brought it back in
shoremeans <- read_csv("shoremeans.full.csv")

## Parsed with column specification:
## cols(
## mes.yr = col_character(),
## shore = col_double(),
## temp = col_double(),
## ciclo = col_double(),
## szn = col_double(),
## mes = col_double(),
## year = col_double()
## )
##instead of opening the whole data frame, just check the column names
##quick way to cheat and see the variables
colnames(shoremeans)

## [1] "mes.yr" "shore" "temp" "ciclo" "szn" "mes" "year"
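As an aside, the Excel round-trip could probably have been avoided with a join done in R. A hedged sketch, assuming the shore.mean and temp.mn data frames created above:

```r
## Merge the monthly shorebird means with the monthly mean temperatures
## on the shared mes.yr key, instead of combining the files in Excel.
library(dplyr)
shore.temp <- left_join(shore.mean, temp.mn, by = "mes.yr")
head(shore.temp)
```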
So next I ran some glms with these means.
##
## Call:
## glm(formula = shore ~ mes + ciclo + szn + temp, data = shoremeans)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -954.04 -216.36 -71.63 206.30 1307.65
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7128.676 1528.669 4.663 3.78e-05 ***
## mes -16.668 20.383 -0.818 0.418612
## ciclo -162.784 58.600 -2.778 0.008453 **
## szn -217.960 124.872 -1.745 0.088984 .
## temp -41.648 9.977 -4.174 0.000168 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 240965.4)
##
## Null deviance: 14275232 on 42 degrees of freedom
## Residual deviance: 9156684 on 38 degrees of freedom
## AIC: 661.59
##
## Number of Fisher Scoring iterations: 2
This model is looking like a pretty poor fit. My thought is that even though we are now working with means, the data is still mostly integers. So I tried the same thing with a gamma distribution, mostly because it’s a different distribution.
##
## Call:
## glm(formula = shore ~ mes + ciclo + szn + temp, family = Gamma,
## data = shoremeans)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.47927 -0.41518 0.01433 0.21005 1.12005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.729e-03 1.590e-03 -3.604 0.000897 ***
## mes 1.717e-05 1.927e-05 0.891 0.378291
## ciclo 1.700e-04 6.038e-05 2.815 0.007682 **
## szn 3.308e-04 1.984e-04 1.667 0.103767
## temp 4.759e-05 1.137e-05 4.187 0.000162 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.3100687)
##
## Null deviance: 21.647 on 42 degrees of freedom
## Residual deviance: 15.538 on 38 degrees of freedom
## AIC: 655.36
##
## Number of Fisher Scoring iterations: 5
Ok ok looking better but this can maybe be cleaner. So I tried transforming the means and doing it again. First I tried square-root transformation.
##decided to play with it a bit since the data is distributed better when transformed
summary(glm(sqrt(shore)~ciclo+szn+temp, data=shoremeans))

##
## Call:
## glm(formula = sqrt(shore) ~ ciclo + szn + temp, data = shoremeans)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -18.1675 -4.5416 0.5746 3.1148 15.6001
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 133.3573 24.9214 5.351 4.11e-06 ***
## ciclo -2.6959 0.9488 -2.841 0.00711 **
## szn -3.4310 1.8177 -1.888 0.06654 .
## temp -0.7485 0.1608 -4.655 3.70e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 64.0638)
##
## Null deviance: 4026.6 on 42 degrees of freedom
## Residual deviance: 2498.5 on 39 degrees of freedom
## AIC: 306.71
##
## Number of Fisher Scoring iterations: 2
This doesn’t look great- just look at the deviance. So then I tried a log transformation.
##
## Call:
## glm(formula = log(shore) ~ ciclo + szn + temp + mes, data = shoremeans)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8739 -0.2273 0.1177 0.4170 1.0157
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.00221 2.09164 7.172 1.44e-08 ***
## ciclo -0.20641 0.08018 -2.574 0.0141 *
## szn -0.35234 0.17086 -2.062 0.0461 *
## temp -0.05979 0.01365 -4.380 9.01e-05 ***
## mes -0.00876 0.02789 -0.314 0.7552
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.4511314)
##
## Null deviance: 27.580 on 42 degrees of freedom
## Residual deviance: 17.143 on 38 degrees of freedom
## AIC: 94.485
##
## Number of Fisher Scoring iterations: 2
Also, I went back to the beginning of Abram’s “glm_esteros” code and saw the gamma distribution glm. So I tried that, got some surprising results, and may bring that into all this later. And actually this is looking pretty decent. So let’s take a quick look at the residuals.
Just from the residuals, definitely looks like there are some patterns in there- soooo maybe not the best fit, even if some of the other parts of modeling are seemingly working out better. Plus, the qq-plot isn’t too convincing.
Great, so we’ve got some models. Maybe they aren’t the best, but it’s what I have to work with so far. Now we need to select and compare them. One way is to use information criteria such as the AICc. There are also some neat ways to automate this: at the end of Abram’s code, I found the function dredge(), so I looked into it. Here I’m using the “shoremeans” file I just created above.
## [1] 101.0197
##but this just spits out the calculated AICc from a given model
##this is cool but we can do better to select a model...
##first I saved the model with the log-transformed mean data
log.mod <- (glm(log(shore)~ciclo+temp+szn+mes, data=shoremeans))

This new model is the same one performed above, whose residuals are shown. Now that we have the model saved, we can use the dredge() function to get a model selection table. When I initially ran the function, I got some errors. With the help of the interwebs and help(), I realized that I needed to change the way R deals with NA values. So before using the function, make sure the code options(na.action = "na.fail") runs, or else you’ll get an error.
options(na.action = "na.fail") ##handling NA values
d1 <- dredge(log.mod, extra = "R^2") ##you can also add extras like R^2 values

## Fixed term is "(Intercept)"
## Global model call: glm(formula = log(shore) ~ ciclo + temp + szn + mes, data = shoremeans)
## ---
## Model selection table
## (Intrc) ciclo mes szn temp R^2 df logLik AICc delta
## 14 14.990 -0.2094 -0.3282 -0.06044 0.3768 5 -41.298 94.2 0.00
## 10 13.830 -0.1938 -0.05751 0.3011 4 -43.764 96.6 2.36
## 16 15.000 -0.2064 -0.00876 -0.3523 -0.05979 0.3784 6 -41.243 96.8 2.60
## 12 13.980 -0.2019 0.01715 -0.05920 0.3089 5 -43.523 98.7 4.45
## 13 9.983 -0.2918 -0.04173 0.2637 4 -44.884 98.8 4.60
## weight
## 14 0.559
## 10 0.172
## 16 0.152
## 12 0.060
## 13 0.056
## Models ranked by AICc(x)
Pretty neat. Now let’s save another model that we checked earlier, the one with the gamma distribution. Feel free to scroll up to check the output of that model.
shoremn.mod <- glm(shore~szn+temp+mes+ciclo,data=shoremeans, family = Gamma)
mod_mn_dredge<-dredge(shoremn.mod, extra = "R^2")
subset(mod_mn_dredge, delta < 3)

## Global model call: glm(formula = shore ~ szn + temp + mes + ciclo, family = Gamma,
## data = shoremeans)
## ---
## Model selection table
## (Intrc) ciclo mes szn temp R^2 df logLik AICc
## 10 -0.004841 0.0001664 4.688e-05 0.2537 4 -323.017 655.1
## 14 -0.005476 0.0001683 0.0002793 4.741e-05 0.2867 5 -322.044 655.7
## 12 -0.004942 0.0001681 8.020e-06 4.721e-05 0.2565 5 -322.938 657.5
## 16 -0.005729 0.0001700 1.717e-05 0.0003308 4.759e-05 0.2987 6 -321.681 657.7
## 9 -0.001070 3.352e-05 0.1557 3 -325.670 658.0
## delta weight
## 10 0.00 0.393
## 14 0.62 0.288
## 12 2.41 0.118
## 16 2.61 0.107
## 9 2.87 0.094
## Models ranked by AICc(x)
The dredge() function works with various types of models. For example, it will also work with negative binomial glms, so here we’ll use the shore_nb_mod from earlier–aka one of the models whose residuals were graphed above.
##
## Call: glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir,
## data = guild_totals, link = "log", init.theta = 1.037219293)
##
## Coefficients:
## (Intercept) szn Temp mes tide_heightlow
## 11.51790 -0.45625 -0.03536 -0.06486 -1.05641
## tide_heightmid tide_heightunk tide_dirrising tide_dirslack
## -0.30532 -0.88247 0.18163 0.61548
##
## Degrees of Freedom: 110 Total (i.e. Null); 102 Residual
## (2 observations deleted due to missingness)
## Null Deviance: 168.1
## Residual Deviance: 127.5 AIC: 1913
## Warning in dredge(shore_nb_mod, extra = "R^2"): number of observations in model
## #0 [113] different from that in global model [111]
## Warning in na.fail.default(structure(list(shore = c(867, 2129, 793, 1243, :
## missing values in object (model 4 skipped)
## (...the same pair of warnings repeats for the remaining candidate models;
## trimmed here for space...)
## Global model call: glm.nb(formula = shore ~ szn + Temp + mes + tide_height + tide_dir,
## data = guild_totals, link = "log", init.theta = 1.037219293)
## ---
## Model selection table
## (Int) mes szn tid_dir tid_hgh R^2 df logLik AICc delta
## 20 9.592 -0.07122 -0.5178 + 0.1936 7 -969.127 1953.3 0
## 28 9.354 -0.07100 -0.5486 + + 0.2125 9 -967.784 1955.3 2
## weight
## 20 0.731
## 28 0.269
## Models ranked by AICc(x)
##from the same package, you can also do some model averaging
summary(model.avg(mod_nb_dredge, subset = delta <4))

##
## Call:
## model.avg(object = mod_nb_dredge, subset = delta < 4)
##
## Component model call:
## glm.nb(formula = shore ~ <2 unique rhs>, data = guild_totals, link =
## log, init.theta = 1.037219293)
##
## Component models:
## df logLik AICc delta weight
## 124 7 -969.13 1953.32 0 0.73
## 1234 9 -967.78 1955.32 2 0.27
##
## Term codes:
## mes szn tide_dir tide_height
## 1 2 3 4
##
## Model-averaged coefficients:
## (full average)
## Estimate Std. Error Adjusted SE z value Pr(>|z|)
## (Intercept) 9.527770 0.523135 0.528913 18.014 < 2e-16 ***
## mes -0.071158 0.028167 0.028491 2.498 0.01251 *
## szn -0.526081 0.166354 0.168253 3.127 0.00177 **
## tide_heightlow -0.947298 0.331006 0.334728 2.830 0.00465 **
## tide_heightmid -0.249012 0.315010 0.318494 0.782 0.43431
## tide_heightunk -0.783498 0.662821 0.670439 1.169 0.24255
## tide_dirrising 0.009135 0.148663 0.150378 0.061 0.95156
## tide_dirslack 0.094990 0.212831 0.213975 0.444 0.65709
##
## (conditional average)
## Estimate Std. Error Adjusted SE z value Pr(>|z|)
## (Intercept) 9.52777 0.52313 0.52891 18.014 < 2e-16 ***
## mes -0.07116 0.02817 0.02849 2.498 0.01251 *
## szn -0.52608 0.16635 0.16825 3.127 0.00177 **
## tide_heightlow -0.94730 0.33101 0.33473 2.830 0.00465 **
## tide_heightmid -0.24901 0.31501 0.31849 0.782 0.43431
## tide_heightunk -0.78350 0.66282 0.67044 1.169 0.24255
## tide_dirrising 0.03391 0.28495 0.28828 0.118 0.90636
## tide_dirslack 0.35261 0.27804 0.28128 1.254 0.20999
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As the many warning messages above were showing me, negative binomial models can be a bit tricky to work with. Ok, this was cool and some good practice in modeling, but I still hadn’t necessarily found the best way to model my data that stays true to the biology. And thus I entered…
I started playing with the idea of mixed models because they are a common tool in biological modeling. Given the data we’re working with and the question we’re trying to answer, the random effect that needs to be taken into account is the observer(s). With several years of data and different people counting birds over the years, that’s probably one of the biggest sources of variation in the data, and hence the random effect of the observer(s)!!
options(na.action = "na.omit")
##add in the year/cycle as a factor
guild_totals$ciclo <- as.factor(guild_totals$temporada)
##another way to do this would be to write factor(temporada) in the formula
##but I just added it in here since it was easier
glmer.s <- glmer(shore ~ temporada + szn + tide_height + tide_dir +
mes + (1 | obs),data = guild_totals,family="poisson")
glmer.s1 <- glmer(shore ~ ciclo + szn + tide_height + tide_dir +
mes + (1 | obs),data = guild_totals, family="poisson")

I removed the warnings and messages from above to save some space. But what I can tell you is that these are pretty trash models, regardless of whether the cycle is read as a factor or as a continuous variable. I was testing the model both ways because I was unsure of how year should be read; I’ve since determined that I’m not trying to compare each year to one another, but rather to look for an overall trend over time.
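To see what lme4 is actually complaining about, a couple of quick diagnostics can help. This is only a sketch against the models fit above:

```r
## Check whether the fit is singular (a random-effect variance estimated at
## or near zero) and inspect any convergence messages lme4 stored on the fit.
library(lme4)
isSingular(glmer.s1)        # TRUE means a degenerate random-effect fit
glmer.s1@optinfo$conv$lme4  # convergence warnings captured during fitting
```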
Singularity, convergence, and just lots of problems and headaches that can be avoided. To troubleshoot, I’ve gone in some different directions. Just for shits and giggles though, let’s go ahead and run a quick line to see how the automated model selection might pick the best model.
## Global model call: glmer(formula = shore ~ ciclo + szn + tide_height + tide_dir +
## mes + (1 | obs), data = guild_totals, family = "poisson")
## ---
## Model selection table
## (Int) ccl mes szn tid_dir tid_hgh df logLik AICc delta weight
## 32 10.27 + -0.06983 -0.6141 + + 15 -51473.9 102982.7 0 1
## Models ranked by AICc(x)
## Random terms (all models):
## '1 | obs'
This is still a bit of a headache. But that’s ok. Just as a refresher, so far I’ve tried GLMs of all sorts (poisson, quasipoisson, negative binomial, log-transformed gaussian) and now we’ve added a poisson GLMER. Since the data thus far has seemed to best fit the negative binomial distribution, let’s use that in our mixed model.
glmernb.shore1 <- glmer.nb(shore ~ temporada + season + (1|obs), data=guild_totals)
summary(glmernb.shore1)

## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.3551) ( log )
## Formula: shore ~ temporada + season + (1 | obs)
## Data: guild_totals
##
## AIC BIC logLik deviance df.resid
## 1929.1 1945.4 -958.5 1917.1 107
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.1564 -0.6659 -0.1840 0.3693 3.5382
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2997 0.5474
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.73328 1.25467 8.555 < 2e-16 ***
## temporada -0.22363 0.07289 -3.068 0.00215 **
## season2 0.87871 0.27830 3.157 0.00159 **
## season3 -0.31475 0.29344 -1.073 0.28343
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd seasn2
## temporada -0.978
## season2 -0.206 0.041
## season3 -0.275 0.115 0.735
## $residDev
## [1] 90.17925
##
## $residDF
## [1] 108
##
## $ratio
## [1] 0.8349931
##
## $P.ChiSq
## [1] 0.8925984
Alright- it looks like we’re no longer dealing with the absurd amount of overdispersion that was previously a problem.
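For reference, the ratio reported above can be approximated by hand from the deviance residuals. This sketch should land close to the $ratio value printed, though the residual df may be counted slightly differently:

```r
## Sum of squared deviance residuals over residual df; values near (or below)
## 1 suggest the NB mixed model has soaked up the extra variance.
rd <- residuals(glmernb.shore1, type = "deviance")
sum(rd^2) / df.residual(glmernb.shore1)
```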
Now let’s take a look at ’em.
##put all the important things into a data frame
resid_shore1<-data.frame(
pearsons_residuals=resid(glmernb.shore1,type="pearson"),
deviance_residuals=resid(glmernb.shore1,type="deviance"),
response_residuals=guild_totals$shore-predict(glmernb.shore1,type = "response"),
predicted=predict(glmernb.shore1,type = "response"),
observed=guild_totals$shore)
##plot predictions and resids
ggplot(resid_shore1,aes(x=predicted,y=pearsons_residuals))+
geom_point() +
ggplot(resid_shore1,aes(x=predicted,y=deviance_residuals))+
geom_point()+
ggplot(resid_shore1,aes(x=predicted,y=response_residuals))+
geom_point()+
ggplot(resid_shore1,aes(x=predicted,y=observed))+ geom_point()
Also note that the code for these plots was more or less copied from Abram’s code.
Similar to previous models, this isn’t necessarily the best-looking set of residual plots, but for now we can work with it.
I have only begun looking into the world of testing models and hypotheses. One method I encountered for testing general hypotheses, from an example I found on the internet, proved useful. Essentially, I wanted to test whether there are differences between the seasons. The code below sets up pairwise comparisons between factor levels using a Tukey test.
posthoc1 <- glht(glmernb.shore1, linfct = mcp(season="Tukey"))
mcs <- summary(posthoc1, test=adjusted("single-step"))
mcs ##call it to look at the pairwise comparison ##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: glmer(formula = shore ~ temporada + season + (1 | obs), data = guild_totals,
## family = MASS::negative.binomial(theta = 1.35510116706339))
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## 2 - 1 == 0 0.8787 0.2783 3.157 0.00439 **
## 3 - 1 == 0 -0.3148 0.2934 -1.073 0.52654
## 3 - 2 == 0 -1.1935 0.2088 -5.717 < 1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
This is pretty neat. It shows which factor levels are significantly different from one another and which are similar enough to be considered the same. Since the season variable has only 3 levels, the output is small and easy to read. But imagine if a variable had 10 levels; it could get messy quickly. Another way to summarize the comparisons is with letters, which makes it much easier to see at a glance which levels are similar and which are significantly different.
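The letter display below was produced with code along these lines (a sketch; I’m assuming the cld() helper from multcomp, applied to the Tukey comparisons fit above):

```r
# Compact letter display: levels that share a letter are not
# significantly different from one another
cld(mcs)
```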
## 1 2 3
## "b" "a" "b"
So what all of that means is that spring and fall are similar to one another (hence the same letter) while winter is different. This makes sense given the influx of migrants in the winter, with birds only moving through (or absent entirely) in the fall and spring. In the original data, 1=fall, 2=winter, 3=spring. Below I’ll recode the data so that fall/spring=1 and winter stays as 2, using a simple ifelse() command.
guild_totals$season <- ifelse(guild_totals$season != 2, 1, 2)
guild_totals$season <- as.factor(guild_totals$season)
Now that the data are combined, let’s take a new look at them. Here I used more or less the same code as the boxplots at the beginning of the document.
And just as a refresher/overview, let’s look at the data again, but divided only by the work cycle/year (“temporada”) rather than by season.
sb <- ggplot(guild_totals, aes(factor(temporada), shore, group=temporada))
sb + geom_boxplot() + labs(title = "Shorebird Counts by Temporada") +
theme(plot.title = element_text(hjust = 0.5))
Since I haven’t been able to get the automated model selection function dredge() to work with the negative binomial glmer, here are some other ways to compare the models. It’s not quite as pretty, but as far as I understand model selection, it makes sense. First we’ll make a null model, then go through some steps to change/add variables in our models.
null.shore <- glmer.nb(shore ~ 1 + (1|obs), data=guild_totals)
##and start with a simple model containing what we want to test
glmernb.shore1 <- glmer.nb(shore ~ temporada + season + (1|obs), data=guild_totals)
summary(glmernb.shore1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.3527) ( log )
## Formula: shore ~ temporada + season + (1 | obs)
## Data: guild_totals
##
## AIC BIC logLik deviance df.resid
## 1928.2 1941.9 -959.1 1918.2 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.1532 -0.6477 -0.2305 0.3518 3.0888
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.3156 0.5618
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 10.35574 1.23492 8.386 < 2e-16 ***
## temporada -0.21397 0.07405 -2.890 0.00386 **
## season2 1.09239 0.18936 5.769 7.98e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.991
## season2 -0.017 -0.053
Just from an initial look at this model summary, it appears there is a significant decline in shorebird abundance over time. It also appears, as expected, that there is a significant increase in the winter months compared to the rest of the year. Before we say anything more, let’s run a handful of other checks to make sure this will be our final model.
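The overdispersion check below comes from code along these lines (a sketch; I’m assuming the GLMEROverdispersion() function from the StatisticalModels package, which reports the residual deviance, residual degrees of freedom, their ratio, and a chi-squared p-value):

```r
# Ratio of residual deviance to residual df; values near (or below) 1
# suggest the model is not overdispersed
GLMEROverdispersion(model = glmernb.shore1)
```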
## $residDev
## [1] 87.99499
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.8072935
##
## $P.ChiSq
## [1] 0.9305589
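The likelihood-ratio test below compares the null model against our candidate model; it was presumably produced with a call like:

```r
# Does adding temporada + season improve on the intercept-only model?
anova(null.shore, glmernb.shore1)
```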
## Data: guild_totals
## Models:
## null.shore: shore ~ 1 + (1 | obs)
## glmernb.shore1: shore ~ temporada + season + (1 | obs)
## npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
## null.shore 3 1958.4 1966.6 -976.19 1952.4
## glmernb.shore1 5 1928.2 1941.9 -959.12 1918.2 34.148 2 3.845e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since there is a significant difference, we can reject the null model and accept the simple negative binomial mixed model as a better fit to the data. I also feel OK about that given that the AIC/BIC are lower and the log-likelihood has moved closer to 0 (which is another one of many things to look for).
There are a wide variety of criteria and methods used to select models, and here I go through a few of them. Now let’s add in some other potential variables that could affect the presence of birds; in this case, the tide!
##add tide height?
glmernb.shore2 <- glmer.nb(shore ~ temporada + season + tide_height + (1|obs),
data=guild_totals)
##add tide dir?
glmernb.shore3 <- glmer.nb(shore ~ temporada + season + tide_dir + (1|obs),
data=guild_totals)
##add both tide variables?
glmernb.shore4 <- glmer.nb(shore ~ temporada + season + tide_height + tide_dir +
(1|obs), data=guild_totals)
We can make a neat model selection table based on information criteria. Here we will use the second-order AIC (AICc) for comparison. For the model names, I abbreviated them to “modX”, so note that “mod1” = “glmernb.shore1”, “mod2” = “glmernb.shore2”, etc.
shore.cand <- list(null.shore, glmernb.shore1, glmernb.shore2, glmernb.shore3, glmernb.shore4)
mod.names <- list(c("null", "mod1", "mod2", "mod3","mod4"))
aictab(shore.cand, mod.names) ##this exports our pretty html table below##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## mod1 5 1928.79 0.00 0.40 0.40 -959.12
## mod3 7 1929.16 0.37 0.33 0.74 -957.05
## mod2 8 1930.45 1.65 0.18 0.91 -956.53
## mod4 10 1931.81 3.02 0.09 1.00 -954.83
## null 3 1958.60 29.81 0.00 1.00 -976.19
Let’s do another check by comparing increasingly complex models with likelihood-ratio tests via anova(). If the chi-squared test is significant, the more complex model is a significantly better fit. While we could run single comparisons between models (i.e. anova(mod1, mod2)), we can also combine everything into one call such as anova(mod1, mod2, mod3, ...), which gives a table in which the models are sorted by complexity and each is compared with the one before it.
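The comparison table below (rendered as markdown) comes from a combined call along these lines:

```r
# Models are sorted by complexity; each is tested against the previous one
anova(glmernb.shore1, glmernb.shore2, glmernb.shore3, glmernb.shore4)
```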
| npar | AIC | BIC | logLik | deviance | Chisq | Df | Pr(>Chisq) | |
|---|---|---|---|---|---|---|---|---|
| glmernb.shore1 | 5 | 1928.231 | 1941.868 | -959.1156 | 1918.231 | NA | NA | NA |
| glmernb.shore3 | 7 | 1928.096 | 1947.187 | -957.0479 | 1914.096 | 4.135420 | 2 | 0.1264751 |
| glmernb.shore2 | 8 | 1929.062 | 1950.881 | -956.5311 | 1913.062 | 1.033537 | 1 | 0.3093293 |
| glmernb.shore4 | 10 | 1929.656 | 1956.930 | -954.8280 | 1909.656 | 3.406285 | 2 | 0.1821103 |
What each of these shows is that adding the height and/or direction of the tide does not give a better-fitting model. That means we should stick with the first (and simplest!) model as the best fit.
Another potential approach to model selection would be to average the models, but rather than follow the more complicated steps to do that, I think it’s OK to stick with this process of selecting a single best model.
Now that we have selected a model, one interesting piece of information to check is the R² value. While this measure is most commonly used with lms and glms, it can also be applied to mixed models. Here the function calculates R² values taking into account only the fixed effects (marginal) as well as all effects (conditional), using 3 different methods. Here’s the explanation taken from help("r.squaredGLMM"):
Marginal R²_GLMM represents the variance explained by the fixed effects, and is defined as: \[R^2_{GLMM(m)} = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}\] Conditional R²_GLMM is interpreted as the variance explained by the entire model, including both fixed and random effects, and is calculated according to the equation: \[R^2_{GLMM(c)} = \frac{\sigma_f^2 + \sigma_\alpha^2}{\sigma_f^2 + \sigma_\alpha^2 + \sigma_\varepsilon^2}\] where \(\sigma_f^2\) is the variance of the fixed-effect components, \(\sigma_\alpha^2\) is the variance of the random effects, and \(\sigma_\varepsilon^2\) is the “observation-level” variance.
So with that we can make a table of these values:
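The table below was generated with something like the following (a sketch assuming MuMIn’s r.squaredGLMM()):

```r
# Marginal (fixed effects only) and conditional (fixed + random) R²,
# via the delta, lognormal, and trigamma methods
r.squaredGLMM(glmernb.shore1)
```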
## R2m R2c
## delta 0.2803543 0.4955452
## lognormal 0.3210871 0.5675431
## trigamma 0.2281494 0.4032695
From this, it looks like our model does a pretty decent job of fitting the data and explaining the variance. Given that we’re modeling change over time without accounting for other environmental variables that are likely affecting the presence/absence/abundance of birds, I feel okay about believing this model.
We’ll run more or less the same code as we used above (see the .rmd file for the code). Recall that the difference now is that season has been collapsed into two levels: fall/spring (1) and winter (2).
Again, it looks more or less the same as before. Our residuals aren’t perfectly randomly distributed but I’m not sure what I could do at this point to get a better fit. Given the process to get to this model, I think it’s acceptable for the sake of this analysis.
Now it’s time to look at our model predictions. One of the most important aspects of a regression are the estimates of each variable in the model. Since we’re working with negative binomial models there is a log link, meaning the effect estimates need to be exponentiated (back-transformed from the log scale) to be interpretable. We can do that easily by using R as a fancy calculator. Before we do that, let’s take a quick look at the different estimates of the model.
The first thing you might notice is that the intercept is very high. Yes, but it’s not that important: if we think about the data we’re modeling, there are counts of shorebirds in the thousands, and in the first year of data (the 2013-2014 season) there is a very high degree of variability.
What we really want to pay attention to are the estimates of the two fixed effects of the model. Just by looking at this graph, the effect of season is positive while that of temporada (the year/cycle) is negative.
If you recall from the summary output of the model, the estimate for temporada was -0.2139745. To translate that into normal terms: as the year/cycle increases by 1, shorebird abundance is multiplied by a factor of 0.8073726. This can be interpreted as “the abundance of shorebirds is declining by about 19.3% each year.” One way we can visualize this is by plotting the predicted values: we can plot the overall trend of the model. Note that this will back-transform the predicted values to the original scale of our response variable.
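Worked out in R (using it as that fancy calculator), the back-transformation looks like this; the coefficient is copied from the model summary above:

```r
# Back-transform a log-link coefficient into an incidence rate ratio
# and a percent change per one-unit increase of the predictor
beta_temporada <- -0.2139745             # temporada estimate from summary()
rate_ratio <- exp(beta_temporada)        # multiplicative change per year
pct_decline <- (1 - rate_ratio) * 100    # percent decline per year
round(rate_ratio, 4)   # 0.8074
round(pct_decline, 1)  # 19.3
```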
However, we know there are differences between the seasons and that season had a significant effect in the model. From the model estimate for that variable, winter counts of shorebirds are exp(1.09239) ≈ 2.98 times those in the rest of the year. Knowing that the difference is so great, let’s take a look at the same graph as above, but grouped by season. Recall that the seasons are split into 1: Fall/Spring and 2: Winter.
Note that both graphs above are the same data, just plotted differently.
In both graphs, the seasons are split into 1 (Fall/Spring, the migration seasons) in purple and 2 (Winter) in blue; these colors will be consistent throughout the rest of this document. This shows that while shorebird abundance is declining in both seasons, the decline is clearly more pronounced in the winter (visually, the slope of the line is steeper). This could potentially be due to a decline in the migratory species overwintering, so it would be a good idea to look further into the migratory species.
Next, plot the random effects, along with a diagnostic plot. From the documentation:
> For generalized linear mixed models, returns the QQ-plot for random effects.
So the diagnostic plot is essentially just a QQ plot.
set_theme(base = theme_linedraw())
plot_model(glmernb.shore1, type="re", colors = "ipsum", vline.color = "red")
## $obs
## `geom_smooth()` using formula 'y ~ x'
The first plot shows the random effects. The red line indicates no effect; blue indicates a positive effect while purple is negative. The second plot, as mentioned, is a QQ plot.
##show it as a connected line
shore_resids <- cbind(resid_shore2, guild_totals$season)
shore_resids <- cbind(shore_resids, guild_totals$temporada)
shore_resids$temporada <- shore_resids$`guild_totals$temporada`
shore_resids$season <- shore_resids$`guild_totals$season`
shore_resids$szn <- as.factor(shore_resids$`guild_totals$season`)
ggplot(shore_resids, aes(temporada, predicted, colour = szn)) +
stat_summary(geom="smooth", fun.data=mean_cl_normal) +
stat_summary(geom="line", fun=mean, linetype="dashed") +
stat_summary(geom="point", fun=mean)
Both during my model exploration with shorebirds and with the other guilds, I ran into a handful of problems. One common problem was a failure of the model to converge. A convergence warning doesn’t necessarily mean the model fits the data poorly; it means the optimizer couldn’t verify that it reached the optimum within its tolerance, which is often fixable. For example, when I tried to apply a negative binomial mixed model to the waterfowl guild data, I ran into this problem. Below is an example of how I handled it.
##now the other guilds. ##this outputs a lot of warnings and that isn't good
glmernb.fowl1 <- glmer.nb(waterfowl ~ temporada + season + (1|obs), data=guild_totals)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00631856 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.5939) ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
## Data: guild_totals
##
## AIC BIC logLik deviance df.resid
## 1158.7 1172.4 -574.4 1148.7 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7671 -0.6636 -0.3612 0.3186 3.0002
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2323 0.482
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.54422 1.55325 6.145 8.01e-10 ***
## temporada -0.36895 0.09658 -3.820 0.000133 ***
## season2 1.29676 0.30908 4.195 2.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.987
## season2 0.219 -0.318
Luckily, I found a neat guide that proved helpful. Below I follow the steps of the aforementioned guide and fix the problem. Essentially, the steps below restart the optimizer from the initial fit while increasing the maximum number of iterations.
## [1] 0.4820253
derivs1 <- glmernb.fowl1@optinfo$derivs
sc_grad1 <- with(derivs1, solve(Hessian,gradient))
max(abs(sc_grad1))
## [1] 8.6121e-06
ss <- getME(glmernb.fowl1,c("theta","fixef"))
glmernb.fowl2 <- update(glmernb.fowl1,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))
No more failure to converge! So let’s check for overdispersion and look at the summary output of the model.
## $residDev
## [1] 81.82569
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.7506944
##
## $P.ChiSq
## [1] 0.9758231
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(0.5939) ( log )
## Formula: waterfowl ~ temporada + season + (1 | obs)
## Data: guild_totals
## Control: glmerControl(optCtrl = list(maxfun = 20000))
##
## AIC BIC logLik deviance df.resid
## 1158.7 1172.4 -574.4 1148.7 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -0.7671 -0.6636 -0.3613 0.3186 3.0001
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2323 0.482
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 9.54401 0.55262 17.270 < 2e-16 ***
## temporada -0.36894 0.03602 -10.242 < 2e-16 ***
## season2 1.29669 0.30533 4.247 2.17e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.888
## season2 -0.019 -0.281
It looks like there is a significant decline in the abundance of waterfowl over time. It also appears, as expected, that there is a significant increase in the winter months compared to the rest of the year.
Below, I present mostly just the numbers from the other guilds.
Since we just worked on the waterfowl model above, let’s start there and begin by plotting the data.
Recall our model glmernb.fowl2 from directly above.
Similar to before, we can once again calculate the R² values. This time we’ll show them in a neat table.
| R2m | R2c | |
|---|---|---|
| delta | 0.2931394 | 0.3782012 |
| lognormal | 0.3952562 | 0.5099498 |
| trigamma | 0.1673828 | 0.2159532 |
Once again, it’s looking like we’ve got a pretty decent looking model.
Let’s check the residuals.
And now let’s look at the model predictions. Using the model estimates, waterfowl abundance has declined by about 30.9% per year, and winter counts are about 3.66 times those of the rest of the year.
Similar to the shorebirds, we can see a similar difference in that the decline seems much more stark in the winter.
Check the data
Make the model
glmernb.gulls <- glmer.nb(gulls_terns ~ temporada + season + (1|obs),
data=guild_totals)
summary(glmernb.gulls)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.1854) ( log )
## Formula: gulls_terns ~ temporada + season + (1 | obs)
## Data: guild_totals
##
## AIC BIC logLik deviance df.resid
## 1805.0 1818.6 -897.5 1795.0 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.0678 -0.7161 -0.2335 0.6357 4.2453
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.08125 0.285
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.34682 0.98663 7.446 9.59e-14 ***
## temporada -0.04662 0.05946 -0.784 0.43309
## season2 0.57935 0.19179 3.021 0.00252 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.988
## season2 0.028 -0.105
## $residDev
## [1] 94.99911
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.8715514
##
## $P.ChiSq
## [1] 0.8280607
This time, while there is a decline in the abundance of gulls/terns/skimmers over time, it is not statistically significant (p = 0.43); there is, however, once again a significant increase in the winter due to the influx of migrants. From the model estimates, the gulls/terns/skimmers guild is declining by about 4.6% per year. While this isn’t as exaggerated as in the other guilds, it is still something worth noting.
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.0851377 | 0.1654172 |
| lognormal | 0.1105094 | 0.2147128 |
| trigamma | 0.0590585 | 0.1147470 |
Review the residuals
Plot the model predictions
Here we can visualize that the decline is not as steep as the other guilds, but still present. We can also once again see the notable differences between the winter and non-winter seasons.
Plot the data
Making the model:
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## unable to evaluate scaled gradient
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge: degenerate Hessian with 1 negative eigenvalues
ss <- getME(glmernb.pel,c("theta","fixef"))
pelmod <- update(glmernb.pel,start=ss,control=glmerControl(optCtrl=list(maxfun=2e4)))
summary(pelmod)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.4891) ( log )
## Formula: pel_corm ~ temporada + season + (1 | obs)
## Data: guild_totals
## Control: glmerControl(optCtrl = list(maxfun = 20000))
##
## AIC BIC logLik deviance df.resid
## 1433.4 1447.0 -711.7 1423.4 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.1437 -0.6468 -0.3666 0.2948 4.0194
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.4335 0.6584
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.42046 0.20036 42.026 < 2e-16 ***
## temporada -0.22764 0.01202 -18.942 < 2e-16 ***
## season2 0.87525 0.20483 4.273 1.93e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.510
## season2 -0.282 -0.280
## $residDev
## [1] 114.2199
##
## $residDF
## [1] 109
##
## $ratio
## [1] 1.047889
##
## $P.ChiSq
## [1] 0.3471449
The pelicans/cormorants/allies guild is declining by about 20.4% each year.
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.2305754 | 0.5309768 |
| lognormal | 0.2593109 | 0.5971499 |
| trigamma | 0.1934467 | 0.4454756 |
Plot model predictions. Something to note here is the y-axis. If you look at the boxplots above, you’ll see two very high counts (plotted as dots, i.e. outliers) in the 2013-2014 season. Those could very obviously be skewing the data, but take a second to look at the axis of the graphed model predictions and you’ll see that its range is much smaller.
Waders: the only guild that lacks a final model.
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00645303 (tol = 0.002, component 1)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(2.6765) ( log )
## Formula: waders ~ temporada + season + (1 | obs)
## Data: guild_totals
##
## AIC BIC logLik deviance df.resid
## 1175.5 1189.1 -582.8 1165.5 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3878 -0.6319 -0.1972 0.4254 3.2289
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2503 0.5003
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.02105 1.05489 5.708 1.14e-08 ***
## temporada -0.11036 0.06322 -1.746 0.0809 .
## season2 -0.06425 0.14351 -0.448 0.6543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.993
## season2 -0.038 -0.018
## $residDev
## [1] 95.30029
##
## $residDF
## [1] 109
##
## $ratio
## [1] 0.8743146
##
## $P.ChiSq
## [1] 0.822308
As you can see, this model failed to converge, and I haven’t worked on getting a better fit yet.
We can put a summary of all the important parts of each model into a single table. Note that here the “estimates” of each variable have been back-transformed. For example, in the shorebird model, rather than display the estimate of -0.21397 for temporada (aka the year), the table shows the back-transformed estimate (recall that for a negative binomial model we have to use the exp() function), i.e. the incidence rate ratio of 0.8073726. This simply means that as that variable increases by 1 unit, the response variable is multiplied by a factor of 0.8073726. Also note that the R² values displayed are calculated using the lognormal method shown in the tables above.
tab_model(glmernb.shore1, glmernb.fowl2, glmernb.gulls, pelmod,
dv.labels = c("Shorebirds", "Waterfowl", "Gulls/Terns/Skimmers",
"Pelicans/Cormorants/Allies"),
string.ci = "Conf. Int (95%)",
string.p = "P-Value", pred.labels = c("Intercept", "Year", "Season"))
| Shorebirds | Waterfowl | Gulls/Terns/Skimmers | Pelicans/Cormorants/Allies | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value |
| Intercept | 31436.98 | 2794.32 – 353676.27 | <0.001 | 13960.80 | 4726.26 – 41238.53 | <0.001 | 1551.26 | 224.32 – 10727.61 | <0.001 | 4538.97 | 3064.83 – 6722.15 | <0.001 |
| Year | 0.81 | 0.70 – 0.93 | 0.004 | 0.69 | 0.64 – 0.74 | <0.001 | 0.95 | 0.85 – 1.07 | 0.433 | 0.80 | 0.78 – 0.82 | <0.001 |
| Season | 2.98 | 2.06 – 4.32 | <0.001 | 3.66 | 2.01 – 6.65 | <0.001 | 1.78 | 1.23 – 2.60 | 0.003 | 2.40 | 1.61 – 3.58 | <0.001 |
| Random Effects | ||||||||||||
| σ2 | 0.55 | 0.99 | 0.61 | 0.52 | ||||||||
| τ00 | 0.32 obs | 0.23 obs | 0.08 obs | 0.43 obs | ||||||||
| ICC | 0.36 | 0.19 | 0.12 | 0.46 | ||||||||
| N | 42 obs | 42 obs | 42 obs | 42 obs | ||||||||
| Observations | 113 | 113 | 113 | 113 | ||||||||
| Marginal R2 / Conditional R2 | 0.321 / 0.568 | 0.395 / 0.510 | 0.111 / 0.215 | 0.259 / 0.597 | ||||||||
Another important grouping to make in the bird data is between the migratory and resident species, for many reasons. One interesting reason, based on the guild models, is that when looking at the plotted model predictions, it seemed that in general the decrease in birds over time was more pronounced in the winter season.
mig <- read_csv("mig_res_totals.csv", col_types = cols(Date = col_date(format = "%m/%d/%y")))
mig <- cbind(mig, a) ##add enviro vars
mig <- cbind(mig, mes) ##add month
mig <- cbind(mig, mes.yr) ##add year
mig <- filter(mig, szn != "4") ##remove summer
mig$season <- ifelse(mig$szn != 2, 1, 2) ##combine spring/fall
colnames(mig) ##check that data combined and is complete
## [1] "Date" "migrants" "residents" "Cloud" "Temp"
## [6] "Precip" "Wind" "temporada" "szn" "obs"
## [11] "tide_height" "tide_dir" "mes" "mes.yr" "season"
Start with migrants. Plot data:
Make the model
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(1.9745) ( log )
## Formula: migrants ~ temporada + season + (1 | obs)
## Data: mig
##
## AIC BIC logLik deviance df.resid
## 1807.2 1820.8 -898.6 1797.2 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3566 -0.5788 -0.1004 0.4177 2.9619
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.2397 0.4896
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.90685 1.09963 7.190 6.46e-13 ***
## temporada -0.13459 0.06464 -2.082 0.0373 *
## season 0.78086 0.15729 4.965 6.89e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.973
## season -0.192 -0.013
From the estimates, migrants show about a 12.6% decrease per year, and winter counts are about 2.18 times those of fall/spring.
Plot the residuals
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.2065678 | 0.4611181 |
| lognormal | 0.2302980 | 0.5140908 |
| trigamma | 0.1782550 | 0.3979160 |
Plot model predictions
First we’ll look at the effect of year in the model, with the other effects held constant.
And now compare that graph to the model predictions with the seasons divided out, as we did before.
Repeat workflow Plot data:
Make the model
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: Negative Binomial(2.2867) ( log )
## Formula: residents ~ temporada + season + (1 | obs)
## Data: mig
##
## AIC BIC logLik deviance df.resid
## 1669.8 1683.4 -829.9 1659.8 108
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.3347 -0.6801 -0.2256 0.4741 3.0598
##
## Random effects:
## Groups Name Variance Std.Dev.
## obs (Intercept) 0.3366 0.5802
## Number of obs: 113, groups: obs, 42
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 5.94511 1.20565 4.931 8.18e-07 ***
## temporada -0.05652 0.07153 -0.790 0.429
## season 0.82429 0.15936 5.173 2.31e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) temprd
## temporada -0.978
## season -0.152 -0.034
Residents, although decreasing, are not declining significantly. While they aren’t necessarily increasing either, this means resident numbers are probably fluctuating year to year but overall are not changing. This is good!! Also, similar to all the other groups, resident counts in the winter months are about 2.28 times those of the rest of the year.
Plot the residuals
Calculate R^2 values
| R2m | R2c | |
|---|---|---|
| delta | 0.1814263 | 0.5366354 |
| lognormal | 0.1970367 | 0.5828090 |
| trigamma | 0.1625537 | 0.4808129 |
Plot model predictions As we can see in the model prediction graphs above- our confidence intervals are very large. And though the trend is a decrease over time, there is not a lot to support that statement. So in looking at these graphs, we can see how much overlap there is in the points and confidence intervals.
Repeating what we did above, we can create a model summary table for our two models. Once again, the model estimates have been back-transformed into incidence rate ratios.
tab_model(mig.mod1, res.mod1,
dv.labels = c("Migrants", "Residents"),
string.ci = "Conf. Int (95%)",
string.p = "P-Value", pred.labels = c("Intercept", "Year", "Season"))
| Migrants | Residents | |||||
|---|---|---|---|---|---|---|
| Predictors | Incidence Rate Ratios | Conf. Int (95%) | P-Value | Incidence Rate Ratios | Conf. Int (95%) | P-Value |
| Intercept | 2715.83 | 314.70 – 23437.40 | <0.001 | 381.88 | 35.95 – 4056.77 | <0.001 |
| Year | 0.87 | 0.77 – 0.99 | 0.037 | 0.95 | 0.82 – 1.09 | 0.429 |
| Season | 2.18 | 1.60 – 2.97 | <0.001 | 2.28 | 1.67 – 3.12 | <0.001 |
| Random Effects | ||||||
| σ2 | 0.41 | 0.36 | ||||
| τ00 | 0.24 obs | 0.34 obs | ||||
| ICC | 0.37 | 0.48 | ||||
| N | 42 obs | 42 obs | ||||
| Observations | 113 | 113 | ||||
| Marginal R2 / Conditional R2 | 0.230 / 0.514 | 0.197 / 0.583 | ||||
This is a clean way to summarize the results.
Given that most of the guilds seem to be trending downward, let’s take one more look at the predicted values of the models for each guild. In each of the models, the starting point was much higher in the winter and the counts seemingly decreased more notably.
One reason I moved from working with biodiversity indices to this modeling project was that an important aspect of the survey data was clearly being missed. When calculating biodiversity indices, for example, birds had to be identified to species. That completely glosses over the impact of larger aggregations of birds that are identified to a group but not necessarily to species (i.e. estimations of “1000 peeps”). By separating the birds into guilds, this important aspect of the data can be captured.
gm1 <- ggplotGrob(plot(pr.mig1, show.title=FALSE, show.legend=TRUE, color="ipsum"))
gre1 <- ggplotGrob(plot(pr.res, show.title=FALSE, show.legend=TRUE, color="ipsum"))
grid.arrange(gm1, gre1)
So looking at our models, it doesn’t seem as if residents are changing, but migrants are. This might suggest that the overall declines across the bird guilds are unequally attributable to declines in overwintering migrants. That could be a big jump to make, but it’s something that seemingly makes sense. BUT ALSO, for the migrant/resident separation I did not include those unidentified birds. Despite this, we still saw some interesting changes, and migrants appear to be following the declining trends.
But finally, a big thing to keep in mind is that there is a lot of variability in all our data over time so all results should be taken with a grain of salt.
While this was fun, the work isn’t done; there are several other cool questions that could be answered. Part of the problem is that the surveys lack a consistent protocol: for example, I do not know how people in 2015 measured the tide or temperature. Gathering better data on historic weather/climate conditions would make it possible to look at changes or potential correlations between temperature and bird relative abundances. Though we take temperature data during each monitoring effort, examining climate through a wider lens would be interesting: precipitation patterns or extreme temperature events could potentially be affecting the phenology of migration in birds that use temp/precip patterns as environmental cues. While some temperature and precipitation data are available here, I found the data for the area to be incomplete and missing the last few years. Another interesting direction would be to relate the abundances of shorebirds (and/or other guilds) to the tides; while we take data on the tides during surveys, it would be possible to characterize the tide more accurately.